Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because the phase parameters are wrapped, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used, which distinguishes voiced and unvoiced segments. However, voice production allows smooth transitions between voiced and unvoiced states, which sometimes makes the voicing segmentation difficult to estimate. In this article, two phase features are suggested to represent the phase of the harmonic model in a uniform way, without a voicing decision. The synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT, or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies.
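
As a brief illustration of why wrapped phase values are awkward to use directly as features, the following minimal sketch compares a naive arithmetic mean of two wrapped phases with a mean computed on the unit circle. It is not taken from the article: the helper names `wrap` and `circular_mean` and the example values are assumptions chosen only for illustration.

```python
import cmath
import math


def wrap(phase):
    """Wrap a phase value in radians to the principal interval [-pi, pi]."""
    return cmath.phase(cmath.exp(1j * phase))


def circular_mean(phases):
    """Mean direction of a set of phases, computed on the unit circle."""
    return cmath.phase(sum(cmath.exp(1j * p) for p in phases))


# Two nearly identical phases that fall on either side of the wrapping point
# (hypothetical values, chosen only to expose the discontinuity).
a = wrap(math.pi - 0.1)  # stays just below +pi
b = wrap(math.pi + 0.1)  # wraps around to just above -pi

# Averaging the wrapped values points in the opposite direction (near 0),
# whereas averaging on the unit circle stays near +/- pi as expected.
naive_mean = (a + b) / 2
circ_mean = circular_mean([a, b])

print(f"naive mean:    {naive_mean:.3f} rad")
print(f"circular mean: {circ_mean:.3f} rad")
```

Any statistic that treats wrapped phase as an ordinary real number (means, variances, interpolation) can show this kind of discontinuity, which is one reason phase features need more care than amplitude features.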